Improving voice activity detection in movies

Authors

  • Bernhard Lehner
  • Gerhard Widmer
  • Reinhard Sonnleitner
Abstract

Voice Activity Detection (VAD) in movies is a challenging task. The different emotional states of the speakers, as well as the variety of soundscapes and noises, contribute to its complexity. In this paper, we propose a set of lightweight features that are specifically designed to perform under such conditions, while at the same time avoiding confusion of singing voice with speech. For evaluation, we use four full-length movies, previously unseen by the system and painstakingly annotated. We compare our detector to a state-of-the-art reference system. The new approach performs better, yielding roughly half the Equal Error Rate (EER) of the reference system. Furthermore, since the ground truth annotation task is extremely tedious, and to help advance research on this topic, we release the annotations of all four movies to the research community.
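For readers unfamiliar with the Equal Error Rate mentioned in the abstract, the following is a minimal sketch of how an EER can be estimated from per-frame detector scores and ground truth labels. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean speech, and the threshold sweep over observed scores are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER of a binary detector (illustrative helper, not from the paper).

    scores: per-frame detector outputs; higher is assumed to mean "more likely speech".
    labels: per-frame ground truth; 1 = speech, 0 = non-speech.
    Returns (eer, threshold) at the threshold where the false alarm rate
    and the miss rate are closest to each other.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    best = None
    # Sweep every observed score as a candidate decision threshold.
    for thr in sorted(set(scores)):
        # A frame is classified as speech when its score is >= thr.
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        misses = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        far = false_alarms / n_neg   # false alarm rate
        frr = misses / n_pos         # miss (false rejection) rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of both rates at the closest crossing point.
            best = (gap, (far + frr) / 2.0, thr)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    print(equal_error_rate(toy_scores, toy_labels))
```

With finite data the two error rates rarely cross exactly, so this sketch reports the average of both rates at the nearest threshold; interpolating along the error curve is a common alternative.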


Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Interactive Voice Modifiable 3D Dynamic Object Based Movies over the Internet

In this paper, we describe an XML based 3D-Voice Enabled Single Transmission Multiple Display Multimedia Language (3D-VE STMDML) model and its implementation. We describe 3D-scenes, mesh based 3D objects, integration of 2D and 3D objects, object animations with collision avoidance, and voice based animation control. This model has been used to download object based movies over the Internet, dyn...


Towards improving statistical model based voice activity detection

Statistical model based voice activity detection (VAD) is commonly used in various speech related research and applications. In this paper, we try to improve the performance of statistical model based VAD via a new feature extraction method. Our main innovation is that we apply Mel-frequency subband coefficients with power-law nonlinearity as features for statistical model based VAD instea...


ON IMPROVING VOICE ACTIVITY DETECTION BY FUZZY LOGIC RULES : CASE OF COHERENCE BASED FEATURES (WedAmOR6)

In this paper, we investigate the use of fuzzy logic for Voice Activity Detection (VAD). The feature extraction part is based on a coherence measure between the noisy speech and its prediction residue. The decision part uses fuzzy logic rules instead of classical thresholding tools. Different fuzzy logic models are developed in order to track noise characteristics. The performances of the algorit...


Voice Activity Detection Using Higher Order Statistics

A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach is based on filtering the input channel to avoid high energy noisy components and then the determination of the speech/non-speech bispectra by means of third order autocumulants. This algorithm differs from many others in the way the decisi...



Publication date: 2015